Towards Behavior-Aware Model Learning from Human-Generated Trajectories
Abstract
Inverse reinforcement learning algorithms recover an unknown reward function for a Markov decision process (MDP) from observations of user behavior that optimizes this reward function. Here we consider the complementary problem of learning the unknown transition dynamics of an MDP from such observations. We describe the behavior-aware modeling (BAM) algorithm, which learns models of transition dynamics from user-generated state-action trajectories. BAM makes assumptions about how users select their actions that are similar to those used in inverse reinforcement learning, and searches for a model that maximizes the probability of the observed actions. The BAM algorithm is based on policy gradient algorithms, essentially reversing the roles of the policy and the transition distribution in those algorithms. As a result, BAM is highly flexible and can be applied to continuous state spaces using a wide variety of model representations. In this preliminary work, we discuss why the model learning problem is interesting, describe algorithms to solve it, and discuss directions for future work.
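The core idea in the abstract, choosing the dynamics model under which the observed actions are most probable, can be illustrated with a toy sketch. This is not the BAM algorithm itself: it assumes a two-state tabular MDP with a known reward, a Boltzmann-rational user, and a single unknown transition parameter `p`, and it swaps the gradient-based search for a plain grid search. All names, constants, and the demonstration data below are hypothetical.

```python
import math

GAMMA = 0.9  # discount factor (assumed for this sketch)
BETA = 2.0   # Boltzmann rationality coefficient (assumed)
N_STATES, N_ACTIONS = 2, 2  # actions: 0 = stay, 1 = go


def reward(s, a):
    # Known reward (assumed): state 1 pays 1.0, and "go" costs 0.3.
    return (1.0 if s == 1 else 0.0) - (0.3 if a == 1 else 0.0)


def transition(p, s, a):
    # Candidate model with one unknown parameter p:
    # "go" switches state with probability p; "stay" never moves.
    probs = [0.0] * N_STATES
    if a == 1:
        probs[1 - s] = p
        probs[s] = 1.0 - p
    else:
        probs[s] = 1.0
    return probs


def q_values(p, iters=500):
    # Plain value iteration under the candidate dynamics.
    v = [0.0] * N_STATES
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(iters):
        for s in range(N_STATES):
            for a in range(N_ACTIONS):
                nxt = transition(p, s, a)
                q[s][a] = reward(s, a) + GAMMA * sum(
                    nxt[s2] * v[s2] for s2 in range(N_STATES))
        v = [max(row) for row in q]
    return q


def policy(p):
    # Assumed user model: pi(a|s) proportional to exp(BETA * Q(s, a)).
    pi = []
    for row in q_values(p):
        m = max(row)
        w = [math.exp(BETA * (x - m)) for x in row]
        z = sum(w)
        pi.append([x / z for x in w])
    return pi


def log_likelihood(p, demos):
    # Log-probability of the observed (state, action) pairs under model p.
    pi = policy(p)
    return sum(math.log(pi[s][a]) for s, a in demos)


def fit(demos, grid=101):
    # Grid search over p, standing in for a gradient-based optimizer.
    candidates = [i / (grid - 1) for i in range(grid)]
    return max(candidates, key=lambda p: log_likelihood(p, demos))


# Hypothetical demonstrations: in state 0 the user chose "go" 8 times out of 10.
demos = [(0, 1)] * 8 + [(0, 0)] * 2
p_hat = fit(demos)
print(p_hat, policy(p_hat)[0][1])
```

In this toy setting the user picks "go" more often when it is more likely to succeed, so the frequency of "go" in state 0 is informative about `p` even though no state transitions are used in the fit. The estimate is only as good as the assumed reward and user model.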
Related articles
Apprenticeship Learning About Multiple Intentions
In this paper, we apply tools from inverse reinforcement learning (IRL) to the problem of learning from (unlabeled) demonstration trajectories of behavior generated by varying “intentions” or objectives. We derive an EM approach that clusters observed trajectories by inferring the objectives for each cluster using any of several possible IRL methods, and then uses the constructed clusters to qu...
Towards learning movement in dense crowds for a socially-aware mobile robot
Robots moving in a crowd occasionally reach situations where they need to decide whether to give way to a human or not, a situation we call a micro-conflict and model with a two player game. We collect data from a robot controlled by a human operator and use three different supervised learning algorithms (random forest, SVM and neuroevolution) to create a decision maker module which imitates th...
Inverse Reinforce Learning with Nonparametric Behavior Clustering
Inverse Reinforcement Learning (IRL) is the task of learning a single reward function given a Markov Decision Process (MDP) without defining the reward function, and a set of demonstrations generated by humans/experts. However, in practice, it may be unreasonable to assume that human behaviors can be explained by one reward function since they may be inherently inconsistent. Also, demonstration...
Measuring the Level of Organizational Learning Capabilities in Hospitals
In organizational studies, the measurement of organizational learning capabilities has become an increasingly important area. Several models in the literature have been derived from statistical data on manufacturing firms. In this paper we use a structural equation model to measure organizational learning in hospitals as service firms. In our model, there are fou...
Human-Mobility-Based Sensor Context-Aware Routing Protocol for Delay-Tolerant Data Gathering in Multi-Sink Cell-Phone-Based Sensor Networks
Ubiquitous use of cell phones encourages development of novel applications with sensors embedded in cell phones. The collection of information generated by these devices is a challenging task considering volatile topologies and energy-based scarce resources. Further, the data delivery to the sink is delay tolerant. Mobility of cell phones is opportunistically exploited for forwarding sensor gen...
Publication date: 2016